(57) Abstract

A system backs up computer files to a remote site (12) via modem (16). Files of a user computer (24) that are found in a common 
library (20) at the remote site (12) are not copied to the remote site, whereas files not in the library (20) are copied to the remote site. 
Then, periodically the user computer (24) determines which blocks have been changed, and the user computer (24) transmits only changed 
blocks to the remote site (12). The blocks are gathered in "chunk" files (76), and when a chunk file reaches a predetermined size (78), it
is transmitted to the remote site (12) for updating the back up version of the respective file. The process then resumes identifying changed
blocks. In addition to flagging the changed block for transfer, the process resynchronizes the local data file with the backed up version
using a two-step comparison, first comparing the first two characters in the block with a pre-stored character set, and then, if the first
comparison results in a match, comparing a digital signature of the changed block with a pre-stored signature. If either comparison results
in a mismatch, the test is repeated using, as the first byte of the test block, the next byte in the sequence.

SYSTEM AND METHOD FOR BACKING UP COMPUTER FILES OVER A
WIDE AREA COMPUTER NETWORK 

FIELD OF THE INVENTION 

The present invention relates generally to computer file back up systems, and more 
particularly to automated computer file back up systems. 

BACKGROUND 

Extraordinarily large amounts of information are stored on electronic media, such as 
personal computer hard disk drives and other well-known data storage media. Increasingly, 
the information is exclusively stored on such media; no "hard" (i.e., paper) copies exist for
much information. It can readily be appreciated that such so-called "paperless" information 
storage systems consume less space and virtually no natural resources, in contrast to systems 
that require information to be stored on paper. Unfortunately, it is not uncommon for 
electronic data storage media to "crash", i.e., to become damaged or otherwise lose their
information. Such information loss can be devastating to the information owner and highly 
costly. 

Accordingly, many computer system managers routinely make back up copies of 
computer files. Typically, once each day (or at some other specified periodicity) a system 
manager will cause the computer system to copy files that are resident on the hard disk drives 
of the system onto storage media such as other hard disk drives or magnetic tape. Should a 
file or entire hard disk drive in the system be damaged, lost, or otherwise rendered 
inaccessible, the back up copy of the file that is stored on, e.g., the storage tape can be 
copied back into the system.

Such back up systems suffer from several drawbacks. First, they usually require user
interaction to initiate the back up process, and consequently occupy the time of a person who
must undertake the tedious chore. Further, systems which back up computer files and store
the files in the same premises as the system being backed up are of little value in the event
of a catastrophe such as a fire or flood. This is because the backed up file copies can be
expected to perish along with the files that are resident in the system, thereby rendering the
back up system a failure.

WO 98/27483    PCT/US97/22065

An example of an alternate back up system is disclosed in U.S. Pat. No. 5,479,654, 
which teaches sending changed portions of computer files via modem to an off-site electronic 
storage facility. As taught in the '654 patent, segments of computer files are checked for 
changes that have been made since the previous back up. This check first consists of 
comparing a so-called "exclusive-OR" ("XOR") product of the file against an XOR value that 
was calculated previously for the stored back up version of the file. If the XORs do not 
agree, a file change is indicated, and the changed portion of the file is sent via modem to the 
off-site facility. 

On the other hand, if the XORs do agree, a second, more rigorous check is made to 
determine whether the segment has been changed since the last back up. The second check 
consists of comparing a cyclic redundancy check (CRC) product of the segment against a 
previously recorded CRC value. If the CRCs do not match, a file change is indicated, and 
the changed portion of the file is sent via modem to the off-site facility. Otherwise, the 
segment is assumed to have not been changed since the last back up. 
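
The two-stage check attributed to the '654 patent may be sketched as follows. This is an illustration only: the actual XOR and CRC computations of the '654 patent are not reproduced in this document, so a byte-wise XOR fold and the CRC-32 of Python's zlib module are assumed here, and the function names are hypothetical.

```python
import zlib
from functools import reduce


def xor_product(segment: bytes) -> int:
    """Fold every byte of the segment together with XOR (assumed quick check)."""
    return reduce(lambda acc, b: acc ^ b, segment, 0)


def segment_changed(segment: bytes, stored_xor: int, stored_crc: int) -> bool:
    """Return True when the segment differs from the backed up version."""
    # First check: differing XOR products already indicate a change.
    if xor_product(segment) != stored_xor:
        return True
    # Second, more rigorous check: reached only when the XOR products agree.
    return zlib.crc32(segment) != stored_crc


old = b"hello world"
stored = (xor_product(old), zlib.crc32(old))
```

A segment in which two bytes have merely been transposed (e.g., b"hello wrold") leaves the XOR product unchanged, so such a segment incurs both calculations before the change is found.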

Unfortunately, the system disclosed in the '654 patent can require two calculations 
per check. Each calculation consumes computing time, and as recognized by the present 
invention, it is consequently desirable to minimize the number of calculations undertaken to 
determine whether a change has been made to a computer file. As further recognized herein, 
it is nevertheless desirable to undertake an initial "quick and dirty" check to reduce the 
number of computationally rigorous checks that must be made.

Moreover, as recognized by the present invention it is necessary to manage the
transmission of data to the off-site storage facility, to avoid one user interfering with the back
up operations of another user. Still further, the present invention recognizes that some files
need not initially be backed up at all. And, the present invention recognizes that file 
restoration can be provided for in more than one way to reduce the time required to restore 
lost files to a computer system. 

Accordingly, it is an object of the present invention to provide a system and method 
for backing up computer files. Another object of the present invention is to provide a system 
and method for backing up computer files to a remote facility via modem. Still another 
object of the present invention is to provide a system and method for backing up computer 
files by automatically sending only changed portions of the files to an off-site facility. Yet 
another object of the present invention is to provide a system and method for backing up 
computer files that is easy to use and cost-effective. 

SUMMARY OF THE INVENTION

A computer program product includes a computer program storage device that is
readable by a digital processing system and a program means on the program storage device. 
As intended by the present invention, the program device is realized in a critical machine 
component that causes the digital processing system to perform method steps to back up at 
least some blocks in at least one local computer file from at least one user computer in a 
computer network. In accordance with the present invention, the method steps include 
transmitting the local file to a data center that is remote from the user computer to render a 
remote version, with the data center being accessible via the computer network. For each of 
the blocks (i.e., each sequence of one thousand (1000) bytes) in the local file, two respective 
characters thereof are copied, with the characters defining a respective first comparison value. 
Additionally, the method steps include generating respective digital signature codes defining
second comparison values. 

Moreover, the method steps include, for each block, periodically determining a test 
digital signature code and comparing it to the respective second comparison value. A block 
of the local file is designated as a transmission block to be transmitted to the data center to 
thereby update the remote version when the test digital signature code of the block does not 
equal the second comparison value of the block. 

In a preferred embodiment, the method steps further include determining whether the 
first two characters of the block, starting at a test byte_i, equal one of the first comparison
values, wherein i = a natural number, when the test digital signature code of a block does
not equal the respective second comparison value. Furthermore, the method steps include
determining a block digital signature code using the test byte_i as the first byte of a test block
when the first two characters of the block equal one of the first comparison values, and
determining whether the block digital signature code equals one of the second comparison
values. When the block digital signature code equals one of the second comparison values, the
method returns "resynchronized". Otherwise, when either the first two characters of the 
block do not equal one of the first comparison values, or when the block digital signature 
code does not equal one of the second comparison values, the method sets i = i+1 and 
repeats the steps of this paragraph. 

The method steps may further include gathering transmission blocks in a transfer 
chunk. When the size of the transfer chunk equals a predetermined size, the transfer chunk 
is transmitted to the data center. Preferably, transfer chunks are transmitted to the data center 
only during a predetermined period. Still further, the method steps may additionally include, 
for each local file block, determining whether a duplicate of the local file block is stored in 
a common file library at the data center. The file block is copied to the data center only
when a duplicate of the local file block is not stored in a common file library. 

In preferred embodiments, the method steps yet further include periodically copying 
remote versions of file blocks at the data center onto a portable data storage medium. Local 
files are restored by copying remote versions of file blocks from the portable data storage 
medium when the remote versions that are stored on the portable data storage medium are at 
least as current as the remote versions stored at the data center, and otherwise they are 
restored by copying remote versions of the file blocks from the data center. 

The computer program product is also disclosed in combination with the digital 
processing apparatus, and in further combination with the computer network. A computer- 
implemented method using the above-described steps is also disclosed. 

In another aspect, a system is disclosed for backing up files in user computers. The 
system includes a data center including a library of common computer file blocks, and a 
plurality of user computers remote from the data center and in communication with the data 
center. Common library logic means determine, for each user computer, which if any of the 
file blocks of the user computer are common file blocks by virtue of being contained in the 
common library. Also, initialization logic means are associated with each user computer for 
causing file blocks except common file blocks to be transmitted to the data center. And, 
subfile incremental back up logic means are associated with each user computer for 
determining changed blocks that have been changed since a predetermined back up time. 
Changed block signals are generated in response thereto. Chunk transmission logic means 
are responsive to the subfile incremental logic means for sending only changed blocks to the 
data center in chunks having a predetermined size. As disclosed in further detail below, 
synchronizing logic means are responsive to the changed block signals for synchronizing the 
subfile incremental logic means.

The details of the present invention, both as to its structure and operation, can best
be understood in reference to the accompanying drawings, in which like reference numerals 
refer to like parts, and in which: 

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a schematic diagram of the computer file back up system of the present 
invention; 

Figure 2 is a flow chart of the initialization process; 

Figure 3 is a flow chart of the subfile incremental back up process; and 

Figure 4 is a flow chart of the restore process. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Referring initially to Figure 1, a system is shown, generally designated 10, which 
includes a server computer, referred to herein as a data center 12, which is part of a 
computer network. In the preferred embodiment, the data center 12 is part of the computer 
network 14 referred to as the Internet, and the data center 12 can communicate with other 
computers on the network 14 via a modem 16. 

As shown in Figure 1, the data center 12 includes an electronic data storage 
repository 18 with an associated common library 20. A control module 22 controls the flow 
of data into and out of the repository 18 in consonance with the novel logic described below. 

Per the present invention, the common library 20 stores the blocks of computer files 
that are widely distributed and used by many users and, hence, that are common to many 
users. For example, the common library 20 can store copies of well-known off-the-shelf 
personal computer operating systems, network browsers, and so on. As intended by the 
present invention, the data storage repository 18 with common library 20 can be physically 
implemented in a suitable medium, such as magnetic tape, hard disk arrays, compact disk
read-only memory (CD-ROM), DASD arrays, or even floppy diskettes. 

Plural user computers 24, e.g., personal computers or laptop computers, communicate 
with the network 14 via respective user modems 26 or other appropriate network connection 
in accordance with well-known principles. In the preferred embodiment, each user computer 
24 includes a respective software-implemented back up module 28 that undertakes the 
inventive steps of the present invention. It is to be understood that in another embodiment, 
the back up modules 28 can be remote from, but accessible to, the user computers 24. 
Additionally, each user computer 24 preferably includes a respective CD-ROM drive 30 for 
purposes to be made clear shortly. As those skilled in the art will recognize, the user 
computers 24 can also be associated with respective video monitors, printers, and other output 
devices, and keyboards, keypads, mice, and other input devices. 

Figures 2-4 illustrate the structure of the back up module 28 of the present invention
as embodied in computer program software. Those skilled in the art will appreciate that the 
Figures illustrate the structures of logic elements, such as computer program code elements 
or electronic logic circuits, that function according to this invention. Manifestly, the 
invention is practiced in its essential embodiment by a machine component that renders the 
logic elements in a form that instructs a digital processing apparatus (that is, a computer) to 
perform a sequence of function steps corresponding to those shown in the Figures. 

These instructions may reside on a program storage device including a data storage 
medium to establish a computer program product, such as a programmed computer diskette. 
Alternatively, such media can also be found in semiconductor devices, on magnetic tape, on 
optical disks, on a DASD array, on a conventional hard disk drive, on electronic read-only 
memory or on electronic random access memory, or other appropriate data storage device. 
In an illustrative embodiment of the invention, the logic means are computer-executable
instructions that are written in C++ language code.

Referring now to Figure 2, the initialization process can be understood. The process 
begins at start state 32 for each user computer 24 and moves to block 34, wherein a file index 
counter "j" is set equal to unity. Moving to decision diamond 36, it is determined whether 
a copy of the j-th file, on a block-by-block basis using the so-called "message digest five"
("MD5") code discussed further below, is already stored in the common library 20. For the
blocks that are stored in the common library, the process moves to block 38 to flag the blocks
as being in the common library and, hence, not candidates for wholesale transfer to the data
center 12. Thus, it is to be understood that in the event that the j-th file as it exists in the user
computer 24 has been modified from the common, off-the-shelf version in the common 
library 20, the modified portions will be detected and transferred to the data center 12 during 
the subfile incremental procedure discussed below in reference to Figure 3. 

In any case, it can now be appreciated that the use of the common library 20 avoids 
the necessity of transmitting via modem to the data center 12 very large, off-the-shelf 
computer files that are resident on many if not most personal computer systems and that 
would otherwise require backing up. In other words, the common library 20 reduces the time 
and cost of making initial copies of the files in the user computers 24. 
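
The common library test of decision diamond 36 may be sketched as shown below. The sketch assumes, purely for illustration, that the common library 20 is represented as a set of MD5 digests of its blocks and that blocks are 1000 bytes long; the names split_blocks and partition_by_library are hypothetical and do not appear in the disclosure.

```python
import hashlib

BLOCK_SIZE = 1000  # assumed block length in bytes


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a file into consecutive blocks; the last block may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def partition_by_library(data: bytes, library_digests: set[bytes]):
    """Return (indices flagged as common, indices that must be transferred)."""
    in_library, to_transfer = [], []
    for i, block in enumerate(split_blocks(data)):
        digest = hashlib.md5(block).digest()
        if digest in library_digests:
            in_library.append(i)      # block 38: flag, do not transfer
        else:
            to_transfer.append(i)     # block 46: candidate for trickle copy
    return in_library, to_transfer


# An "off-the-shelf" file and a user copy whose third block was modified.
shelf = bytes(range(250)) * 12
library = {hashlib.md5(b).digest() for b in split_blocks(shelf)}
user_file = shelf[:2000] + b"user-modified tail"
result = partition_by_library(user_file, library)
```

In this example only the modified third block of the user file is queued for transfer; the first two blocks are flagged as common.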

Additionally, at block 38 a block-by-block comparison value listing is recorded that
has two entries for every block_i (wherein "i" is a block index counter) of every file_j flagged
as being in the common library 20. The first entry is the first two characters ("XX_i") of the
first byte of each block_i. In contrast, the second entry is a digital signature ("MD5_i"), the
value of which is uniquely defined by the contents of the block_i. The first entry establishes
a first comparison value, and the second entry establishes a second comparison value.

As mentioned above, in the presently preferred embodiment, the digital signature used
is the 128 bit-long "message digest five" ("MD5") code known in the cryptology art. As
recognized by the present invention, the MD5 code of a block changes a great deal with even
very small changes of the block. Thus, the digital signature of the present invention
advantageously is more sensitive to changes in a block than are check sums, cyclic
redundancy codes, and so-called "exclusive or" codes.
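
A minimal sketch of the comparison value listing, and of the sensitivity of the MD5 code relative to an "exclusive or" code, is given below. It assumes that the "first two characters" are the first two bytes of a block and that the XOR code is a simple byte-wise fold; build_listing and xor_code are hypothetical names.

```python
import hashlib
from functools import reduce

BLOCK_SIZE = 1000  # assumed block length in bytes


def build_listing(data: bytes, size: int = BLOCK_SIZE) -> list[tuple[bytes, bytes]]:
    """One (XX_i, MD5_i) pair per block: first two bytes and 128-bit digest."""
    listing = []
    for start in range(0, len(data), size):
        block = data[start:start + size]
        listing.append((block[:2], hashlib.md5(block).digest()))
    return listing


def xor_code(block: bytes) -> int:
    """Byte-wise XOR fold, used here only for comparison with MD5."""
    return reduce(lambda acc, b: acc ^ b, block, 0)


original = b"AB" + bytes(998)
swapped = b"BA" + bytes(998)   # same bytes, with two of them transposed
listing = build_listing(original + swapped)
```

The two blocks yield the same XOR code, so an XOR check alone would not distinguish them, whereas their MD5 entries in the listing differ.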

From block 38, the logic proceeds to decision diamond 40 to determine whether the 
last file in the user computer 24 has been tested. If so, the initialization process terminates 
at state 42. Otherwise, the process moves to block 44 to increment the file index counter "j" 
by one, and then to loop back to decision diamond 36. 

In contrast, if it is determined at decision diamond 36 that a copy of the i-th block of
the j-th file is not in the common library 20, the logic proceeds to block 46 to trickle copy each
non-resident block_i of the file_j to the storage repository 18 of the data center 12. By "trickle
copy" is meant that the non-resident blocks of the file are transmitted to the data center 12
during a predetermined transmission period, also referred to herein as a "window", until the
entire non-resident portion of the file has been transmitted, or until the transmission period elapses.

If the transmission period elapses before the non-resident portion of the file has been
transmitted in its entirety, the respective back up module 28 notes the value of the block index
counter "i", and stops transmitting the file_j. Then, at the start of the next transmission
period, the back up module 28 increments the block index counter "i" by unity and
commences transmitting the file_j where it previously had left off. As indicated in Figure 2,
the block-by-block listing of XX_i and MD5_i is generated at block 46 as described previously
in reference to block 38. From block 46, the logic proceeds to decision diamond 40 to
function as previously described.
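
The resumable nature of the trickle copy may be illustrated by the following sketch. The callbacks send and window_open, and the helper make_window that simulates a window permitting a fixed number of transmissions, are assumptions made for the example. The sketch returns the index of the next unsent block, which is equivalent to noting the last transmitted index and incrementing it by unity.

```python
def trickle_copy(blocks, start_index, send, window_open):
    """Send blocks from start_index until done or until the window closes.

    Returns the index of the next block still to be sent, so that the
    following transmission period can resume from that point.
    """
    i = start_index
    while i < len(blocks) and window_open():
        send(i, blocks[i])
        i += 1
    return i


def make_window(budget):
    """Simulated window (hypothetical): stays open for `budget` transmissions."""
    remaining = [budget]

    def window_open():
        remaining[0] -= 1
        return remaining[0] >= 0

    return window_open


sent = []
blocks = [b"a", b"b", b"c"]
first_resume = trickle_copy(blocks, 0, lambda i, b: sent.append(i), make_window(2))
second_resume = trickle_copy(blocks, first_resume, lambda i, b: sent.append(i), make_window(2))
```

Here the first window closes after two blocks, and the second window transmits the remaining block without resending the first two.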

Turning now to Figure 3, the subfile incremental back up logic can be seen. It is to
be understood that as intended by the present invention, periodically (e.g., on a daily basis) 
the back up module 28 of each user computer 24 will undertake the process discussed below. 

Commencing at decision diamond 48, it is determined whether a user-selected 
predetermined back up period has commenced. When it has, the process moves to block 50 
to enter a do loop for each file_j that has been designated by the user to be backed up. Then,
the process moves to block 52 to enter a do loop for each block_i of the file_j that was selected
at block 50.

Moving to block 54, the back up module 28 computes the digital signature MD5_i^new of
the block_i under test. Next, the digital signature MD5_i^new of the block_i under test is compared,
at decision diamond 56, with the digital signature MD5_i^old that is recorded in the listing that was
generated in Figure 2, as described above. 

When the digital signatures are equal, indicating that no change was made to the 
block_i under test since the last back up, the process moves to decision diamond 58. At
decision diamond 58, the logic determines whether the last block of the file_j has been tested,
and if not, the process loops back to retrieve the next block_i and compute its digital signature
as shown and described. In other words, if further blocks exist for the file_j under test, the
index counter "i" is incremented by one, and the process loops back to block 54.

If, on the other hand, the last block of the file_j has been tested, the logic determines,
at decision diamond 60, whether the last file in the user computer 24 has been tested. If not, 
the process loops back to retrieve the next file and proceed as before to test the blocks in the 
file. Otherwise, the process moves to decision diamond 62. 
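
The block-by-block signature test of blocks 52-58 may be sketched as follows for a single file. The listing is assumed to be a list of (first two bytes, MD5 digest) pairs, one per 1000-byte block, and changed_blocks is a hypothetical name. The sketch compares blocks at fixed offsets only and omits the resynchronization described further below.

```python
import hashlib

BLOCK_SIZE = 1000  # assumed block length in bytes


def changed_blocks(data, listing, size=BLOCK_SIZE):
    """Return the indices of blocks whose current MD5 differs from the listing."""
    changed = []
    for i, start in enumerate(range(0, len(data), size)):
        new_digest = hashlib.md5(data[start:start + size]).digest()
        # A block beyond the end of the listing has no old signature to match.
        old_digest = listing[i][1] if i < len(listing) else None
        if new_digest != old_digest:
            changed.append(i)
    return changed


old = bytes(3000)
listing = [(old[s:s + 2], hashlib.md5(old[s:s + BLOCK_SIZE]).digest())
           for s in range(0, len(old), BLOCK_SIZE)]

edited = bytearray(old)
edited[1500] = 1          # overwrite one byte inside the second block
grown = old + b"x"        # append one byte, creating a fourth block
```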

At decision diamond 62, it is determined whether the back up period has not elapsed, 
i.e., whether the allocated back up window is still open. If the period has elapsed, the logic 
loops back to decision diamond 48 to await the commencement of the next allocated back up 
period. In contrast, if the back up window is still open, the logic attempts to establish
communication with the data center 12 via the modems 26, 16. 

At decision diamond 64, it is determined whether the communication connection was 
successful. If it wasn't, the logic moves to block 66 to wait a predetermined time-out period 
(preferably five minutes) before looping back to decision diamond 62 to retry the connection 
if the window is still open. In any case, when it is determined that communication has been 
established between the user computer 24 and the data center 12 subsequent to testing all files
in the user computer 24, the logic encrypts and compresses any remaining updated blocks
(described further below) and transmits them to the data center 12, at block 68. From block 
68, the process loops back to decision diamond 48. 
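
The connection retry of decision diamonds 62 and 64 and block 66 may be sketched as a simple loop. The callbacks try_connect and window_open are hypothetical, and the sleep function is passed in as a parameter so that the example can run without actually waiting.

```python
import time


def connect_within_window(try_connect, window_open, timeout_s=300, sleep=time.sleep):
    """Retry the connection until it succeeds or the back up window closes."""
    while window_open():            # diamond 62: is the window still open?
        if try_connect():           # diamond 64: was the connection successful?
            return True
        sleep(timeout_s)            # block 66: preferred time-out of five minutes
    return False


outcomes = iter([False, False, True])   # two failed attempts, then success
waits = []
connected = connect_within_window(lambda: next(outcomes), lambda: True,
                                  sleep=waits.append)
closed = connect_within_window(lambda: False, lambda: False, sleep=waits.append)
```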

Returning to the negative loop originating at decision diamond 56, when the digital 
signature of the block_i does not match the signature stored in the listing for the block, a
change to the block_i is indicated, and the block_i therefore becomes a candidate for back up.
Additionally, the process resynchronizes itself with the block-by-block comparison value 
listing discussed above. 

Accordingly, after a negative test at decision diamond 56 (i.e., when the logic 
essentially returns a "changed block" signal), the logic moves to block 70, wherein a do loop 
is entered for the bytes_k of the block_i, wherein "k" is an index counter initially equal to unity.
Moving to decision diamond 72, the logic determines whether the first two characters
("XX_k^new") of the block starting with the byte_k under test equal the first comparison value
("XX_i^old") that corresponds to the block_i in the comparison value listing. If not, the logic
moves to decision diamond 73 to determine whether the current byte_k is the last byte of the
candidate block under test. Stated differently, at decision diamond 73 the logic determines
whether the byte index counter "k" equals one thousand. If it does, the logic moves to block
74 to increment the block index counter "i" by unity, and then the logic loops back to block
70 with the byte index counter "k" reset to unity. Thus, the next thousand comparisons at
decision diamond 72 accordingly will use, as the first comparison value, the first two
characters of the new "i-th" block. If it is determined that the byte index counter "k" does not
equal one thousand at decision diamond 73, the logic loops back to increment "k" by unity
and test the next byte in the block_i at decision diamond 72.

If, however, it is determined at decision diamond 72 that "XX_k^new" equals "XX_i^old",
the logic proceeds to decision diamond 75. At decision diamond 75, the digital signature of
the block having as its first byte the byte_k under test is determined and compared to the
second comparison value ("MD5_i^old") that corresponds to the block_i in the comparison value
listing. If the two are not equal, the logic moves to decision diamond 73.

If the digital signature of the block having as its first byte the byte_k under test is
determined to be equal to the second comparison value ("MD5_i^old") at decision diamond 75,
the logic returns "resynchronized" and moves to block 76. In other words, a positive test at 
decision diamond 75 indicates that the logic has found an old, unchanged block that 
previously has been backed up, and, hence, that the logic is resynchronized with the 
comparison value listing. 
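
One simplified reading of the resynchronization loop of blocks 70-75 is sketched below. It assumes a listing of (first two bytes, MD5 digest) pairs for 1000-byte blocks, counts the byte index "k" from zero rather than unity, and uses the hypothetical names build_listing and resynchronize. The inexpensive two-character test screens each byte offset, and the MD5 code is computed only for offsets that pass it.

```python
import hashlib

BLOCK_SIZE = 1000  # assumed block length in bytes


def build_listing(data, size=BLOCK_SIZE):
    """(first two bytes, MD5 digest) for each block of the backed up version."""
    return [(data[s:s + 2], hashlib.md5(data[s:s + size]).digest())
            for s in range(0, len(data), size)]


def resynchronize(data, listing, i, size=BLOCK_SIZE):
    """Scan forward from changed block i for an old, unchanged block.

    Returns (block index, byte offset) at which the logic is resynchronized,
    or None when the listing is exhausted without a match.
    """
    while i < len(listing):
        old_prefix, old_digest = listing[i]
        for k in range(size):                     # byte index within block_i
            offset = i * size + k
            # Diamond 72: quick test on two characters only.
            if data[offset:offset + 2] != old_prefix:
                continue
            # Diamond 75: full signature of the block starting at this byte.
            candidate = data[offset:offset + size]
            if hashlib.md5(candidate).digest() == old_digest:
                return i, offset
        i += 1                                    # block 74: next listing entry
    return None


old = bytes((n * 7 + n // 1000) % 251 for n in range(4000))
listing = build_listing(old)
new = old[:1500] + b"INSERT" + old[1500:]         # six bytes inserted in block 1
result = resynchronize(new, listing, 1)
```

In this example six bytes are inserted into the second block, so the old third block is found six bytes later than its original position, at which point the logic is resynchronized with the listing.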

At block 76, the changed block(s) (also referred to herein as "transmission blocks") 
are moved to a "next chunk" file. Additionally, at block 76 the comparison value listing is 
updated to include the first two characters and digital signatures of the changed block(s), for
use as the first and second comparison values, respectively, during the test of the blocks 
during the next back up cycle. Moving to decision diamond 78, it is determined whether the 
chunk file is full. In the presently preferred embodiment, the chunk file is full when its size 
is five megabytes (5 MB). 

If the chunk file is not full, the logic returns to decision diamond 58. In contrast, if 
the chunk file is full, the process moves to decision diamond 80 to determine whether the 
back up period has not elapsed, i.e., whether the allocated back up window is still open. If
the period has elapsed, the logic loops back to decision diamond 48 to await the 
commencement of the next allocated back up period. In contrast, if the back up window is 
still open, the logic transmits the chunk when a successful connection has been established 
with the data center 12, using the procedure of steps 64 and 66 discussed above. From block 
82, the logic returns to decision diamond 58. 
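
The gathering of transmission blocks at block 76 and the fullness test of decision diamond 78 may be sketched as follows. The class ChunkFile, the function gather and the transmit callback are hypothetical, and treating five megabytes as 5 x 1024 x 1024 bytes is an assumption. The back up window and connection tests of decision diamond 80 are omitted for brevity.

```python
CHUNK_LIMIT = 5 * 1024 * 1024  # five megabytes; binary megabytes are assumed


class ChunkFile:
    """Accumulates transmission blocks until the chunk is full."""

    def __init__(self, limit=CHUNK_LIMIT):
        self.limit = limit
        self.blocks = []
        self.size = 0

    def add(self, index, block):
        self.blocks.append((index, block))
        self.size += len(block)

    def is_full(self):
        return self.size >= self.limit

    def drain(self):
        """Hand over the gathered blocks and start an empty chunk."""
        blocks, self.blocks, self.size = self.blocks, [], 0
        return blocks


def gather(changed, chunk, transmit):
    """Move changed blocks into the chunk, transmitting whenever it fills."""
    for index, block in changed:
        chunk.add(index, block)
        if chunk.is_full():
            transmit(chunk.drain())


sent = []
chunk = ChunkFile(limit=2500)   # small limit so the example fills quickly
gather([(i, bytes(1000)) for i in range(7)], chunk, sent.append)
```

With the small limit used here, two chunks of three blocks each are transmitted and one block remains in the chunk file, corresponding to the remaining updated blocks that are sent at block 68 once all files have been tested.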

Figure 4 shows the logic by which lost files may be restored to a user computer 24. 
It is to be understood that as envisioned herein, backed up file blocks at the data center 12 
are periodically (e.g., weekly, monthly, quarterly, etc.) copied from the storage repository 
18 to CD-ROM disks, and the disks then transported to the same physical location as the 
associated user computer 24. Hereinafter, these CD-ROM disks are referred to as "local back 
up disks". 

At block 84, a directory of blocks in the data center 12 that correspond to the user
computer 24 is downloaded from the data center 12. Next, at block 86 a do loop is entered
for each block_i of the lost file(s) as follows. It is determined at decision diamond 88 whether
the requested version of the block of the file (ordinarily the latest version) is on the local back 
up disks. If so, the block is preferentially restored from the local back up disks at block 90. 
Otherwise, it is restored via the modems 26, 16 from the data center 12 at block 92. From 
block 90 or block 92, the logic loops back to retrieve the next block_i, and continues this
process until the entire requested files have been restored. 
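
The choice of restore source at decision diamond 88 may be sketched as follows. Representing the downloaded directory and the contents of the local back up disks as mappings from a block identifier to an integer version number is an assumption made for the example, as is the name plan_restore.

```python
def plan_restore(directory, local_disks):
    """Map each block id to "local" or "data_center".

    directory:   block id -> requested version (downloaded from the data center)
    local_disks: block id -> version held on the local back up disks
    """
    plan = {}
    for block_id, wanted in directory.items():
        if local_disks.get(block_id) == wanted:
            plan[block_id] = "local"          # block 90: restore from the disks
        else:
            plan[block_id] = "data_center"    # block 92: restore via modem
    return plan


directory = {0: 3, 1: 5, 2: 1}
local_disks = {0: 3, 1: 4}       # block 1 is stale and block 2 is absent locally
plan = plan_restore(directory, local_disks)
```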

While the particular SYSTEM AND METHOD FOR BACKING UP COMPUTER 
FILES OVER A WIDE AREA COMPUTER NETWORK as herein shown and described in
detail is fully capable of attaining the above-described objects of the invention, it is to be 
understood that it is the presently preferred embodiment of the present invention and is thus 
representative of the subject matter which is broadly contemplated by the present invention, 
that the scope of the present invention fully encompasses other embodiments which may
become obvious to those skilled in the art, and that the scope of the present invention is 
accordingly to be limited by nothing other than the appended claims.

WHAT IS CLAIMED IS:

1. A computer program product comprising:

a computer program storage device readable by a digital processing system; and 
logic means on the program storage device and including instructions executable by 
the digital processing system for performing method steps for backing up at least some blocks 
in at least one local computer file from at least one user computer in a computer network, the 
method steps comprising: 

(a) transmitting the local file to a data center remote from the user 
computer to render a remote version, the data center being accessible via the 
computer network; 

(b) for each of the blocks in the local file, copying two respective 
characters thereof defining respective first comparison values and generating 
respective digital signature codes defining second comparison values; 

(c) for each block, periodically determining a test digital signature code 
and comparing it to the respective second comparison value; and 

(d) designating a block of the local file as a transmission block to be 
transmitted to the data center to thereby update the remote version when the test 
digital signature code of the block does not equal the second comparison value of the 
block. 

2. The computer program product of Claim 1, wherein the method steps further
comprise: 

(e) when the test digital signature code of a block does not equal the 
respective second comparison value, determining whether the first two characters of 
a block, starting with a test byte_i of the block, equal one of the first comparison
values, wherein i = a natural number;

(f) when the first two characters equal one of the first comparison values,
determining a block digital signature code using the test byte_i as the first byte of a test
block; 

(g) determining whether the block digital signature code equals one of the 
second comparison values; 

(h) when the block digital signature code equals one of the second
comparison values, returning "resynchronized"; otherwise 

(i) when either the first two characters do not equal one of the first 
comparison values, or when the block digital signature code does not equal one of the 
second comparison values, setting i = i+1 and repeating steps (e)-(i).

3. The computer program product of Claim 2, wherein the method steps further
comprise: 

gathering transmission blocks in a transfer chunk; and 
when the size of the transfer chunk equals a predetermined size, transmitting 
the transfer chunk to the data center. 

4. The computer program product of Claim 3, wherein transfer chunks are 
transmitted to the data center only during a predetermined period. 



5. The computer program product of Claim 4, wherein the method steps further 
comprise: 
for each local file block, determining whether a duplicate of the local file
block is stored in a common file block library at the data center; and 

undertaking step (a) only when a duplicate of the local file block is not stored 
in a common file block library. 



6. The computer program product of Claim 5, wherein the method steps further 
comprise: 

periodically copying remote versions at the data center onto a portable data 
storage medium; and 

restoring local files by copying remote versions from the portable data storage 
medium when the remote versions stored on the portable data storage medium are at 
least as current as the remote versions stored at the data center, and otherwise 
restoring local files by copying remote versions from the data center. 
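
The choice of restore source in this claim reduces to a single comparison. The sketch below assumes each copy carries a timestamp of its last update; the function name and the string return values are hypothetical.

```python
from datetime import datetime


def choose_restore_source(portable_timestamp: datetime,
                          data_center_timestamp: datetime) -> str:
    """Prefer the portable medium when it is at least as current."""
    if portable_timestamp >= data_center_timestamp:
        return "portable"
    return "data_center"
```

Preferring the portable medium on a tie avoids pulling the same data over the network link.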

7. The computer program product of Claim 2, in combination with the digital 
processing apparatus. 

8. The combination of Claim 7, in further combination with the computer 
network. 

9. A system for backing up files in user computers, comprising: 

a data center including a library of common computer files; 

a plurality of user computers remote from the data center and in 
communication with the data center; 

common library logic means for determining, for each user computer, which 
if any of the file blocks of the user computer are common file blocks by virtue of 
being contained in the common library; 

initialization logic means associated with each user computer for causing file 
blocks except common file blocks to be transmitted to the data center; 

subfile incremental back up logic means associated with each user computer 
for determining changed blocks that have been changed since a predetermined back 
up time and generating changed block signals in response thereto; 

chunk transmission logic means responsive to the subfile incremental logic 
means for sending only changed blocks to the data center in chunks having a 
predetermined size; and 

synchronizing logic means responsive to the changed block signals for 
synchronizing the subfile incremental logic means. 

10. The system of Claim 9, further comprising logic means for generating a 
listing of the blocks of the files stored at the data center and, associated with each block, the 
first two characters of the block and a digital signature of the block, and the subfile 
incremental logic means uses the listing to undertake the determining function. 
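
The listing described in this claim can be pictured as one record per block. The sketch below is only an illustration: the record type `BlockEntry`, the fixed block size, and MD5 as the signature are assumptions not found in the claims.

```python
import hashlib
from typing import List, NamedTuple


class BlockEntry(NamedTuple):
    index: int        # position of the block within the file
    first_two: bytes  # first two characters of the block
    signature: bytes  # digital signature of the whole block


def build_listing(data: bytes, block_size: int) -> List[BlockEntry]:
    """Split data into fixed-size blocks and record one entry per block."""
    entries = []
    for index, offset in enumerate(range(0, len(data), block_size)):
        block = data[offset:offset + block_size]
        entries.append(
            BlockEntry(index, block[:2], hashlib.md5(block).digest()))
    return entries


# Hypothetical usage: a 10-byte file in 4-byte blocks yields three entries.
listing = build_listing(b"AAAABBBBCC", 4)
```

Keeping the two characters alongside the signature is what allows a cheap first comparison before any signature is recomputed.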

11. The system of Claim 10, wherein the synchronizing logic means uses the 
listing to undertake the synchronizing function. 

12. The system of Claim 11, further comprising: 

logic means for periodically copying remote versions at the data center onto 
a portable data storage medium; and 
logic means for restoring local files by copying remote versions from the 
portable data storage medium when the remote versions stored on the portable data 
storage medium are at least as current as the remote versions stored at the data 
center, and otherwise restoring local files by copying remote versions from the data 
center. 

13. A computer-implemented method for backing up at least one local computer file from 
at least one user computer in a computer network, comprising: 

(a) transmitting the local file to a data center remote from the user 
computer to render a remote version, the data center being accessible via the 
computer network; 

(b) for each of the blocks in the local file, copying two respective 
characters thereof defining respective first comparison values and generating 
respective digital signature codes defining second comparison values; 

(c) for each block, periodically determining a test digital signature code 
and comparing it to the respective second comparison value; and 

(d) designating a block of the local file as a transmission block to be 
transmitted to the data center to thereby update the remote version when the test 
digital signature code of the block does not equal the second comparison value of the 
block. 
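
Steps (c) and (d) can be sketched as a per-block signature comparison. This is a hedged illustration rather than the claimed method itself: the function name `changed_blocks`, the fixed block size, MD5 as the signature, and the treatment of blocks beyond the stored listing as changed are all assumptions.

```python
import hashlib
from typing import List


def changed_blocks(data: bytes, block_size: int,
                   stored_signatures: List[bytes]) -> List[int]:
    """Return indices of blocks whose test signature differs from the stored one."""
    changed = []
    for index, offset in enumerate(range(0, len(data), block_size)):
        test_sig = hashlib.md5(data[offset:offset + block_size]).digest()
        # Blocks beyond the stored listing are treated as changed (assumption).
        if index >= len(stored_signatures) or test_sig != stored_signatures[index]:
            changed.append(index)
    return changed


# Hypothetical usage: only the second 4-byte block was modified in place.
stored = [hashlib.md5(b).digest() for b in (b"AAAA", b"BBBB", b"CCCC")]
to_send = changed_blocks(b"AAAAxBBBCCCC", 4, stored)
```

The indices returned are the blocks that would be designated as transmission blocks.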

14. The computer-implemented method of Claim 13, further comprising: 

(e) when the test digital signature code of a block does not equal the 
respective second comparison value, determining whether the first two characters of 
the block equal one of the first comparison values; 




(f) when the first two characters equal one of the first comparison values, 
determining a block digital signature code using a test byte i as the first byte of a test 
block; 

(g) determining whether the block digital signature code equals one of the 
second comparison values; 

(h) when the block digital signature code equals one of the second 
comparison values, returning "resynchronized"; otherwise 

(i) when either the first two characters starting with the test byte i do not equal one 
of the first comparison values, or when the block digital signature code does not 
equal one of the second comparison values, setting i = i+1 and repeating steps (e)-(i). 

15. The computer-implemented method of Claim 14, further comprising: 
gathering transmission blocks in a transfer chunk; and 

when the size of the transfer chunk equals a predetermined size, transmitting 
the transfer chunk to the data center. 

16. The computer-implemented method of Claim 15, wherein transfer chunks are 
transmitted to the data center only during a predetermined period. 
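
The "predetermined period" can be checked with a simple time-of-day test. The sketch below assumes the period is a daily window that may cross midnight; the function name and the example hours are hypothetical and not drawn from the claims.

```python
from datetime import time


def in_transmission_window(now: time, start: time, end: time) -> bool:
    """Return True when `now` falls inside the window [start, end).

    Handles windows that cross midnight (for example 23:00 to 05:00).
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


# Hypothetical usage: an overnight window from 23:00 to 05:00.
allowed = in_transmission_window(time(2, 0), time(23, 0), time(5, 0))
```

A completed transfer chunk would be held until this check passes and then transmitted.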

17. The computer-implemented method of Claim 16, further comprising: 

for at least some of the local file blocks, determining whether a duplicate of 
the local file block is stored in a common file block library at the data center; and 

undertaking step (a) only when a duplicate of the local file block is not stored 
in a common file block library. 

18. The computer-implemented method of Claim 17, further comprising: 

periodically copying remote versions at the data center onto a portable data 
storage medium; and 

restoring local file blocks by copying remote versions from the portable data 
storage medium when the remote versions stored on the portable data storage medium 
are at least as current as the remote versions stored at the data center, and otherwise 
restoring local file blocks by copying remote versions from the data center. 